Good practices
Read “Good Enough Practices in Scientific Computing” (Wilson et al. 2017). The authors explain how to organize projects, why and how you should make your dataframes tidy, and how to make sure your work is readable, reproducible, and thus extensible.
If you expect to write a lot of code, also read “Best Practices for Scientific Computing” (Wilson et al. 2014), which goes into more depth on software engineering (encapsulation, version control, profiling, debugging, documenting, collaborating).
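To make the idea of a tidy dataframe concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from either paper: the data, the column names, and the helper `to_tidy` are all hypothetical. It reshapes a wide table (one column per time point) into a tidy one, where each row holds exactly one observation.

```python
# Hypothetical wide table: one row per sample, one column per time point.
wide_rows = [
    {"sample": "A", "t0": 1.0, "t1": 1.5},
    {"sample": "B", "t0": 2.0, "t1": 2.5},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_key="sample", var_name="timepoint", value_name="value")
for tidy_row in tidy_rows:
    print(tidy_row)
```

In the tidy layout, adding a new time point adds rows rather than columns, so code that filters, groups, or plots by `timepoint` does not need to change.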
Here is a checklist of good practices, taken mostly from those two papers (and how they’re implemented in the lab):
- Collaboration
- Use GitHub for everything project-related.
- Create an overview of your project: write a good `README.md`.
- Create a shared to-do list for the project: use GitHub issues.
- Open issues on GitHub liberally to discuss ideas, problems, and suggestions.
- Make the license explicit: we recommend the MIT license.
- Data management
- Save the raw data: all raw datasets live on SCC. You can make temporary copies on one of the workstations, but be aware of the space limitations.
- Ensure the raw data is backed up: that’s why we’re using SCC, which handles all the backups.
- Record all the steps used to process data.
- Software
- Check into git (and GitHub) anything that has been written by hand by a human (except data tables).
- Write explanatory comments at the top of your scripts, with at least one example of how to use them.
- Keep changes small and commit often.
- Be ruthless about eliminating duplication.
- Make dependencies and requirements explicit (e.g. `requirements.txt`).
- Manuscripts
- Keep a running manuscript as you’re working on the project: we use a mix of Overleaf and Word.
- Use a reference manager to automatically handle the bibliography: we recommend Zotero and Mendeley.
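As an illustration of two checklist items above (explanatory comments at the top of a script with a usage example, and recording the steps used to process data), here is a minimal sketch. Everything in it is an assumption made for the example: the script name `normalize.py`, the `value` column, and the log file `processing_log.jsonl` are hypothetical, not lab conventions.

```python
"""Normalize a table of measurements (hypothetical example script).

Usage example:
    python normalize.py raw.csv normalized.csv

Reads raw.csv, rescales the "value" column to the range [0, 1], writes
normalized.csv, and appends a record of the run to processing_log.jsonl.
"""
import csv
import json
import sys
from datetime import datetime, timezone


def normalize(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


def log_step(log_path, step, **details):
    """Append one processing step to the log as a single JSON line."""
    record = {"time": datetime.now(timezone.utc).isoformat(), "step": step, **details}
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def main(in_path, out_path, log_path="processing_log.jsonl"):
    """Read the input CSV, normalize its "value" column, write it, log the run."""
    with open(in_path, newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    scaled = normalize([float(row["value"]) for row in rows])
    for row, value in zip(rows, scaled):
        row["value"] = value
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    log_step(log_path, "normalize", input=in_path, output=out_path, n_rows=len(rows))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print(__doc__)  # wrong number of arguments: show the usage text
```

The raw file is only read and never overwritten, and each run appends one line to the log, so the log can later be read back as a record of how a processed file was produced. This sketch uses only the standard library; a script with third-party dependencies would list them in `requirements.txt`.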
References
Wilson, Greg, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, et al. 2014. “Best Practices for Scientific Computing.” Edited by Jonathan A. Eisen. PLoS Biology 12 (1): e1001745. https://doi.org/10.1371/journal.pbio.1001745.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. “Good Enough Practices in Scientific Computing.” Edited by Francis Ouellette. PLOS Computational Biology 13 (6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510.